 educational data mining


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

This paper proposes a probabilistic approach for learning the assignment of exercises to skills from student data, where student knowledge changes while exercises are being solved; the model also estimates the student knowledge while estimating the skill assignments. The paper uses a weighted CRP to model the assignment, incorporating expert labelings through the weighting. In simulation, the method recovers skill labelings with high accuracy, with little dependence on the expert labels, and across several datasets, the paper finds that skill labelings from this method result in higher prediction accuracy than other approaches. Overall, I found the paper to be clear and the proposed model is a relatively novel extension of existing methods.
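
To make the idea of a weighted CRP concrete, here is a minimal sketch of how expert labels could bias the seating of one exercise. It is an illustration under stated assumptions, not the paper's formulation: the function name `wcrp_assignment_probs`, the concentration parameter `alpha`, the bonus `beta`, and the rule that each seated exercise counts 1 plus `beta` when it shares the new exercise's expert label are all hypothetical choices. With `beta = 0` the sketch reduces to the standard CRP.

```python
def wcrp_assignment_probs(expert_label, tables, alpha=1.0, beta=2.0):
    """Seating probabilities for one new exercise under an illustrative weighted CRP.

    tables: list of tables, each a list of the expert labels of exercises
            already assigned to that skill.
    Returns one probability per existing table, followed by the probability
    of opening a new table (a new skill).
    """
    weights = []
    for members in tables:
        # Assumption: each seated exercise contributes weight 1, plus a bonus
        # of beta when its expert label matches the new exercise's label.
        w = sum(1.0 + (beta if m == expert_label else 0.0) for m in members)
        weights.append(w)
    # A new table is opened with weight alpha, as in the standard CRP.
    weights.append(alpha)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: two existing skills, new exercise labeled "frac".
tables = [["frac", "frac"], ["alg"]]
probs = wcrp_assignment_probs("frac", tables, alpha=1.0, beta=2.0)
```

Raising `beta` makes the inferred skills follow the expert labeling more closely; lowering it lets the student data dominate.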


PIIvot: A Lightweight NLP Anonymization Framework for Question-Anchored Tutoring Dialogues

Zent, Matthew, Smith, Digory, Woodhead, Simon

arXiv.org Artificial Intelligence

Personally identifiable information (PII) anonymization is a high-stakes task that poses a barrier to many open-science data sharing initiatives. While PII identification has made large strides in recent years, in practice, error thresholds and the recall/precision trade-off still limit the uptake of these anonymization pipelines. We present PIIvot, a lighter-weight framework for PII anonymization that leverages knowledge of the data context to simplify the PII detection problem. To demonstrate its effectiveness, we also contribute QATD-2k, the largest open-source real-world tutoring dataset of its kind, to support the demand for quality educational dialogue data.


Improving the portability of predicting students performance models by using ontologies

Zambrano, Javier Lopez, Lara, Juan A., Romero, Cristobal

arXiv.org Artificial Intelligence

One of the main current challenges in Educational Data Mining and Learning Analytics is the portability or transferability of predictive models obtained for a particular course so that they can be applied to other, different courses. In handling this challenge, one of the foremost problems is the models' excessive dependence on the low-level attributes used to train them, which reduces the models' portability. To solve this issue, the use of high-level attributes with more semantic meaning, such as ontologies, may be very useful. Along this line, we propose the utilization of an ontology that uses a taxonomy of actions that summarises students' interactions with the Moodle learning management system. We compare the results of this proposed approach against our previous results when we used low-level raw attributes obtained directly from Moodle logs. The results indicate that the use of the proposed ontology improves the portability of the models in terms of predictive accuracy. The main contribution of this paper is to show that the ontological models obtained in one source course can be applied to other, different target courses with similar usage levels without losing prediction accuracy.


Combining Cognitive and Generative AI for Self-explanation in Interactive AI Agents

Sushri, Shalini, Dass, Rahul, Basappa, Rhea, Lu, Hong, Goel, Ashok

arXiv.org Artificial Intelligence

The Virtual Experimental Research Assistant (VERA) is an inquiry-based learning environment that empowers a learner to build conceptual models of complex ecological systems and experiment with agent-based simulations of the models. This study investigates the convergence of cognitive AI and generative AI for self-explanation in interactive AI agents such as VERA. From a cognitive AI viewpoint, we endow VERA with a functional model of its own design, knowledge, and reasoning represented in the Task-Method-Knowledge (TMK) language. From the perspective of generative AI, we use ChatGPT, LangChain, and Chain-of-Thought to answer user questions based on the VERA TMK model. Thus, we combine cognitive and generative AI to generate explanations about how VERA works and produces its answers. The preliminary evaluation of the generation of explanations in VERA on a bank of 66 questions derived from earlier work appears promising.


Evaluating Algorithmic Bias in Models for Predicting Academic Performance of Filipino Students

Švábenský, Valdemar, Verger, Mélina, Rodrigo, Maria Mercedes T., Monterozo, Clarence James G., Baker, Ryan S., Saavedra, Miguel Zenon Nicanor Lerias, Lallé, Sébastien, Shimada, Atsushi

arXiv.org Artificial Intelligence

Algorithmic bias is a major issue in machine learning models in educational contexts. However, it has not yet been studied thoroughly in Asian learning contexts, and only limited work has considered algorithmic bias based on regional (sub-national) background. As a step towards addressing this gap, this paper examines the population of 5,986 students at a large university in the Philippines, investigating algorithmic bias based on students' regional background. The university used the Canvas learning management system (LMS) in its online courses across a broad range of domains. Over a period of three semesters, we collected 48.7 million log records of the students' activity in Canvas. We used these logs to train binary classification models that predict student grades from the LMS activity. The best-performing model reached an AUC of 0.75 and a weighted F1-score of 0.79. Subsequently, we examined the data for bias based on students' region. Evaluation using three metrics (AUC, weighted F1-score, and MADD) showed consistent results across all demographic groups. Thus, no unfairness was observed against a particular student group in the grade predictions.
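
A per-group evaluation of this kind can be sketched as follows. This is only an illustration of comparing AUC across groups, not the authors' pipeline: the function names `auc` and `auc_by_group`, the group labels, and the toy data are hypothetical, and the weighted F1-score and MADD metrics mentioned in the abstract are not covered. AUC is computed here from its pairwise definition (the fraction of positive-negative pairs ranked correctly, with ties counted as one half).

```python
def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count correctly ordered pairs; a tie counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(labels, scores, groups):
    """Compute AUC separately for each group label (e.g. a student's region)."""
    by_group = {}
    for y, s, g in zip(labels, scores, groups):
        ys, ss = by_group.setdefault(g, ([], []))
        ys.append(y)
        ss.append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in by_group.items()}


# Hypothetical toy data: binary grade outcome, model score, and region.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.7, 0.6, 0.4, 0.3, 0.5]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
per_group = auc_by_group(labels, scores, groups)
```

A large gap between groups in `per_group` would be a signal worth investigating; similar values across groups correspond to the "consistent results" the abstract reports.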


Exploring Fairness in Educational Data Mining in the Context of the Right to be Forgotten

Qian, Wei, Chen, Aobo, Zhao, Chenxu, Li, Yangyi, Huai, Mengdi

arXiv.org Artificial Intelligence

Student data, a critical component of EDM research, can contain personal information, such as age and gender, as well as academic performance and activity data from online learning systems [24]. By offering valuable insights into student learning, EDM supports the development of more effective educational practices and policies, ultimately improving student outcomes. One of the most popular approaches in previous work is to incorporate machine learning techniques, which have achieved remarkable success in discovering intricate structures within educational datasets. However, in recent years, concerns about the fairness of deploying algorithmic decision-making in the educational context have emerged [2, 22, 27, 49]. In particular, machine learning models can produce biased and unfair outcomes for certain student groups, significantly affecting their educational opportunities and achievements. Given that the data empowering EDM research can often contain personally identifiable and other sensitive information, there has been increased attention to privacy protection in recent years [37, 43]. Additionally, privacy legislation such as the California Consumer Privacy Act [39] and the former Right to be Forgotten [17] has granted users the right to erase the impact of their sensitive information from the trained models to protect their privacy. One approach to protecting users' privacy involves enabling the trained machine learning model to entirely forget ... [Footnote: Both authors contributed equally to this research.]


Deep Learning for Educational Data Science

Pinto, Juan D., Paquette, Luc

arXiv.org Artificial Intelligence

As artificial intelligence (AI) continues to penetrate ever deeper into modern life, one particular family of machine learning algorithms, namely deep neural networks, has come to be seen as the solution to many of the challenges that have stumped more classical algorithms in the past. Modeled loosely on the structure of biological neural networks, artificial neural networks consist of chains of simple mathematical transformations that can model complex non-linear decision boundaries in large problem spaces. In particular, deep neural networks (artificial neural networks that consist of multiple layers of transformations) allow for sufficient complexity to tackle tasks in a wide variety of fields. These models are collectively and more colloquially referred to as deep learning. A growing body of education researchers are now also turning their attention to leveraging the power of deep learning algorithms for the tasks of improving and understanding human learning. Researchers in educational data science, a field consisting of various interrelated research communities such as Educational Data Mining (EDM), Learning Analytics (LA), and AI in Education (AIED), have been involved in this endeavor.


Deep Knowledge Tracing is an implicit dynamic multidimensional item response theory model

Vie, Jill-Jênn, Kashima, Hisashi

arXiv.org Artificial Intelligence

Knowledge tracing consists in predicting the performance of some students on new questions given their performance on previous questions, and can be a prior step to optimizing assessment and learning. Deep knowledge tracing (DKT) is a competitive model for knowledge tracing relying on recurrent neural networks, even if some simpler models may match its performance. However, little is known about why DKT works so well. In this paper, we frame deep knowledge tracing as an encoder-decoder architecture. This viewpoint not only allows us to propose better models in terms of performance, simplicity or expressivity but also opens up promising avenues for future research. In particular, we show on several small and large datasets that a simpler decoder, with possibly fewer parameters than the one used by DKT, can predict student performance better.


A review of clustering models in educational data science towards fairness-aware learning

Quy, Tai Le, Friege, Gunnar, Ntoutsi, Eirini

arXiv.org Artificial Intelligence

Ensuring fairness is essential for every education system. Machine learning is increasingly supporting the education system and educational data science (EDS) domain, from decision support to educational activities and learning analytics. However, machine learning-based decisions can be biased because the algorithms may generate results based on students' protected attributes such as race or gender. Clustering is an important machine learning technique to explore student data in order to support the decision-maker, as well as support educational activities, such as group assignments. Therefore, ensuring high-quality clustering models that also satisfy fairness constraints is an important requirement. This chapter comprehensively surveys clustering models and their fairness in EDS. We focus especially on investigating the fair clustering models applied in educational activities. These models are believed to be practical tools for analyzing students' data and ensuring fairness in EDS.


Demonstrating REACT: a Real-time Educational AI-powered Classroom Tool

Kulkarni, Ajay, Gkountouna, Olga

arXiv.org Artificial Intelligence

We present a demonstration of REACT, a new Real-time Educational AI-powered Classroom Tool that employs EDM techniques for supporting the decision-making process of educators. REACT is a data-driven tool with a user-friendly graphical interface. It analyzes students' performance data and provides context-based alerts as well as recommendations to educators for course planning. Furthermore, it incorporates model-agnostic explanations for bringing explainability and interpretability in the process of decision making. This paper demonstrates a use case scenario of our proposed tool using a real-world dataset and presents the design of its architecture and user interface. This demonstration focuses on the agglomerative clustering of students based on their performance (i.e., incorrect responses and hints used) during an in-class activity. This formation of clusters of students with similar strengths and weaknesses may help educators to improve their course planning by identifying at-risk students, forming study groups, or encouraging tutoring between students of different strengths.
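
The clustering step described above can be illustrated with a small, self-contained sketch. It is not REACT's implementation: the function name `agglomerative`, the choice of average linkage with Euclidean distance, and the toy feature values (incorrect responses, hints used) are all assumptions made for illustration, since the abstract does not specify them.

```python
from math import dist


def agglomerative(points, n_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters.

    points: list of (incorrect_responses, hints_used) tuples, one per student.
    Returns a list of clusters, each a list of student indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Assumption: average linkage, i.e. the mean pairwise Euclidean distance.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical students: two with few errors and hints, three with many.
students = [(0, 0), (1, 0), (8, 6), (9, 7), (8, 7)]
result = agglomerative(students, n_clusters=2)
```

In this toy example the two low-error students end up in one cluster and the three students with many errors and hints in the other, which is the kind of grouping an educator might use to spot at-risk students or form study groups.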